ABSTRACT



This analysis addresses two important questions regarding 
appropriate uses of the National Assessment of Educational Progress (NAEP) 
statewide average test score comparisons. First is whether NAEP state test 
score averages provide meaningful and valid comparisons of the relative 
educational quality or proficiency of specific states. The other question is 
what NAEP state test score comparisons really mean. These questions are 
addressed through data from the 1996 NAEP and the 1992 NAEP State Trial 
Assessments in Mathematics for eighth graders. Analyses indicate that 89% of 
the state differences in the NAEP-92 Trial State Assessment test score 
averages can be explained by variations in four demographic variables over 
which schools have no control. This suggests that rather than measuring 
differences in the quality or proficiency of the states’ educational 
programs, the NAEP-92 Trial State Assessment scores appear to reflect differences in 
what might be called the difficulty of the educational tasks or challenges 
facing the states. Where the differences in NAEP state test score averages 
are found to correlate highly with certain student demographic variables, 
such correlations should not be used to expect less learning from children in 
adverse circumstances. Instead, these findings should be rough indicators of 
the need for appropriate resources and instructional support to help 
students. In addition to the demographic influences on NAEP state assessment 
score averages, certain nondemographic factors, such as nonresponse bias, may 
influence NAEP state test scores substantially, materially affecting state 
rankings and state comparisons. This is an additional reason for not 
considering state test score averages as indicators of the relative quality 
or proficiency of state educational programs. (Contains 10 figures and 20 
references.) (SLD) 
What Do NAEP State Comparisons and Rankings Really Mean?



This analysis will address two important questions regarding appropriate uses of the National 
Assessment of Educational Progress statewide average test score comparisons. 

1) Do the NAEP state test score averages provide meaningful and valid comparisons 
of the relative educational quality, or proficiency, of specific states? 

2) What do NAEP state test score comparisons really mean? 

In addressing these two questions, this paper will draw from data collected in both the NAEP 
1996 and the NAEP 1992 State Trial Assessments in Mathematics. Where available at the time 
the paper was in process, NAEP-96 data were used. Because only the general results of the 
NAEP 1996 Mathematics Report Card for the Nation and the States were available when the 
paper was in preparation and important NAEP-96 state student demographic data had yet to be 
published, it was necessary to draw from the previous statistical analysis of NAEP-92 data by 
Robinson and Brandon (1994), titled NAEP Test Scores: Should They Be Used to Compare 
and Rank State Educational Quality? It is reasonable to assume, however, that the general 
relationships between student demographic characteristics and NAEP state mathematics test 
score averages found in the analysis of NAEP-92 data will closely match relationships to be 
found using NAEP-96 data. 

Writing in 1991 concerning the initial NAEP-90 Trial State Assessment, Daniel Koretz, then 
Senior Social Scientist at the RAND Corporation, stated: 

To infer that a difference between two states on the NAEP reflects specific policies or 
practices, one needs to be able to reject with reasonable confidence other plausible 
explanations, such as economic or demographic differences (1991, 20). 

In the NAEP-92 Trial State Assessment reports, it is assumed that differences in state math test 
score averages reflect significant differences in the relative quality or proficiency of the 
states’ educational programs. Before assuming that such state variations in NAEP-92 Trial State 
Assessment math scores do reflect meaningful differences in state educational policies, 
programs, practices, or proficiencies, it is important to examine student demographic variables 
to see whether or not such variables provide plausible explanations for state differences in 
student NAEP test score averages. 

It should be noted that this analysis does not address any of the other possible uses or purposes 
of NAEP national or state assessments. 
Data Sources and Focus of the Analysis 



While both 4th and 8th graders participated in the 1992 NAEP Trial State Assessment, only 
data on 8th graders are used in this analysis because their self-reported answers to demographic 
questions are expected to be more accurate than those of 4th graders. 

The 1992 NAEP includes information on several student-related factors. However, this analysis 
will deal with only four demographic factors — three factors as reported by NAEP-92 8th-grade 
test takers in each of the participating states, plus one demographic factor as reported by the 
U.S. Bureau of the Census for each of the states. 

The reasons for concentrating on these four demographic factors are: 1) research on at-risk 
students typically emphasizes demographic factors; 2) educational policy makers and concerned 
citizens are more likely to attribute objectivity to demographic data; and 3) other research on 
the NAEP and other large-scale assessments has found links between demographic variables 
and student test scores. (See Cooley 1993; Drazen 1992; Lapointe et al. 1992; Lazer 1992; 
Wolf 1992; Pallas et al. 1989.) 

The data used in the analysis for each state represent the proportion of the tested students in the 
state having a particular characteristic. For example, 19 percent of 8th-grade students 
participating in NAEP-92 in California reported living in a disadvantaged urban community (see 
Figure 10, page 22). 

Student Demographic Variables Used in the Analysis 

The student demographic variables (that is, population characteristics or factors) reported by 
8th-grade NAEP test takers and used in the analysis include indicators of: 

1. number of parents living at home 

2. parent(s)’ educational background 

3. community type. 

Figure 1 on page 3 lists these three demographic variables and shows the different categories or 
levels by which each variable is reported in the NAEP-92 data tables (NCES 1993, pp. 73, 83, 
702). An asterisk indicates categories or levels of each variable that were negatively correlated 
with student math scores and were used in this analysis, as shown later in Figure 2 on page 4. 

The fourth demographic variable used in the analysis and also shown in Figure 1 is the 1992 state 
poverty rate for children ages 5-17 in each state provided by the U. S. Bureau of the Census 
(1993). 
Figure 1. Demographic Variables Used in This Analysis, Showing Different 
Categories Reported in NAEP-92 Mathematics Assessment 

1. Number of Parents Living at Home: 
   Both parents at home 
   One parent at home* 
   Neither parent at home* 

2. Parent(s)’ Educational Background: 
   Don't know 
   Not high school graduate* 
   High school graduate only* 
   Some education after high school 
   College graduate 

3. Community Type: 
   Advantaged urban 
   Disadvantaged urban* 
   Extreme rural 
   Other community type 

4. State Poverty Rate, Ages 5-17: 
   These are 1992 state poverty rates as reported by the U.S. Bureau of the Census. 

* Categories or levels of variable used in this analysis. 

SOURCE: National Center for Education Statistics 1993; U.S. Bureau of the Census 1993. 



Student Demographic Variables Associated with Lower Scores on the 1992 
Math NAEP 

Each level or category of the demographic variables shown in Figure 1 above was correlated with 
the state mean scores from the 1992 NAEP mathematics test. Those demographic variable 
categories having a negative correlation with NAEP state-level math scores were used in this 
analysis and are shown in Figure 2 on page 4. Categories of the variables having positive 
correlations with student math scores, such as “parent(s) college graduate,” are not shown in 
Figure 2 and were not used in this analysis. 

Figure 2 also shows the statistical values for “R” and “R²” for each of the variable levels listed. 
The “R” indicates the direction (either positive or negative) and strength of the relationship in 
the correlation, which in these cases is negative for all of the variables listed; for example, it is 
-.84 for students living in a home with only one parent. The “R²” (which is the square of the 
coefficient of correlation “R”) is the proportion of variance explained and describes the 
percentage of variability among the states’ NAEP-92 mathematics scores that can be predicted, 
or accounted for, using each of the variables shown. 

For example, 71 percent of the variation in the 1992 state NAEP mathematics test scores can be 
predicted using only data on percentage of test takers in a state living in a home with only one 
parent (-.84 x -.84 = .7056 = 71%). Likewise, 56 percent of the state differences in NAEP-92 
mathematics test scores can be predicted by using the single variable of state poverty rates for the 
population ages 5-17 years. 
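
The conversion from R to R² described above can be checked directly. The following is a minimal Python sketch, not part of the original analysis; the dictionary name and the function name are illustrative, and the correlations are the values reported in Figure 2.

```python
# Correlations (R) as reported in Figure 2; labels are shorthand for this sketch.
correlations = {
    "Neither parent living at home": -0.86,
    "One parent living at home": -0.84,
    "State 1992 poverty rate, ages 5-17": -0.75,
    "Parent(s) not high school graduate": -0.68,
    "Living in a disadvantaged urban community": -0.55,
    "Parent(s) high school graduate only": -0.46,
}


def percent_explained(r):
    """Square the correlation and express it as a whole-number percentage."""
    return round(r * r * 100)


explained = {label: percent_explained(r) for label, r in correlations.items()}

for label, pct in explained.items():
    print(f"{label}: R = {correlations[label]:+.2f}, R squared = {pct}%")
```

Because the sign disappears when R is squared, R² says nothing about direction; that information comes only from R itself.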



Figure 2. Demographic Variable Categories Associated Negatively with State-Level 
Performance on the 1992 NAEP 8th Grade Mathematics Assessment: 
Correlation (R) and Percentage of Variability Explained by Each Variable (R²)¹ 

                                                                 Percentage of 
                                                                 Variability Explained 
Variable Categories or Levels                  Correlation (R)   by Variable (R²) 

Neither parent living at home                       -.86              74% 
One parent living at home                           -.84              71% 
State 1992 poverty rate, ages 5-17                  -.75              56% 
Parent(s) not high school graduate                  -.68              46% 
Living in a disadvantaged urban community           -.55              30% 
Parent(s) high school graduate only                 -.46              21% 

Total Variability Explained by All Demographic Variables in Combination²   89% 

¹ Includes variables with statistically significant negative correlations at the .01 level of testing. Calculated using 
StatView SE+Graphics software. 

² This is a combined variable effect, not the sum of the variables listed. 



Combined Effects Explain 89 Percent of State Variation 

When the effects of the demographic variable categories shown in Figure 2 above are combined 
so as to account for any overlapping effects, 89 percent of the differences in NAEP-92 
mathematics average scores for the 42 participating states (includes D.C.) can be predicted by the 
combined effects of state variations in the four demographic variables. The 89 percent combined 
effect shown in Figure 2 was calculated by using a multiple regression equation in which the 
demographic variables were used as “x’s” to predict the “y” of NAEP mathematics score averages 
in 1992. (See Technical Note on page 23.) 

Note that this is not a summative procedure. If it were, the total variability would add to more 
than 100 percent. Rather, the equation used in calculating the combined effect takes into account 
where the variables do and do not overlap each other in predicting the test score for each state. 
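
Why overlapping predictors do not simply add can be illustrated with the standard formula for the squared multiple correlation of two predictors. The sketch below is only an illustration under stated assumptions: it uses the two correlations from Figure 2 for one-parent homes and state poverty rate, but the correlation between those two predictors is not reported in this paper, so the value 0.60 is a purely hypothetical number. The result is therefore not the 89 percent reported above, which comes from all of the variables together.

```python
def combined_r_squared(r_y1, r_y2, r_12):
    """Squared multiple correlation for two predictors.

    r_y1, r_y2: correlation of each predictor with the outcome.
    r_12: correlation between the two predictors (their overlap).
    """
    return (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)


r_one_parent = -0.84    # from Figure 2
r_poverty = -0.75       # from Figure 2
assumed_overlap = 0.60  # HYPOTHETICAL: not reported in the paper

# Adding the separate R squared values overshoots 100 percent.
naive_sum = r_one_parent**2 + r_poverty**2

# Accounting for overlap gives a value that cannot exceed 100 percent.
combined = combined_r_squared(r_one_parent, r_poverty, assumed_overlap)

print(f"Naive sum of R squared values: {naive_sum:.0%}")
print(f"Combined R squared with assumed overlap: {combined:.0%}")
```

The larger the overlap between predictors, the less the second predictor adds beyond the first, which is why the combined effect in Figure 2 is far smaller than the sum of the individual percentages.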
Figure 3 on page 6 shows the national average student NAEP-92 math score for each of the 
variable categories used in this analysis. Variables with negative correlations are associated with 
NAEP mathematics test scores that are lower than the national average student proficiency score 
of 266. 

As described above, the categories of the four demographic factors included in Figure 2, in 
combination, explain 89 percent of the variation in state NAEP-92 math test scores. To illustrate 
the extent of the influence of the four demographic variables on state mean NAEP math scores in 
1992, Figure 4 on page 7 plots two lines. One is the actual NAEP scores and the other is the 
predicted NAEP scores using the combined effects of the demographic variables shown in 
Figure 2. 

Observe how closely the predicted scores match the actual scores. This demonstrates how closely 
each state’s four combined demographic factors predict the state’s NAEP mathematics scores for 
1992. 

Figure 5 on page 8 presents in tabular form the same actual and predicted state average scores 
shown in graphic form in Figure 4. The states are listed alphabetically for ease in comparing the 
actual and predicted scores and the actual and predicted rankings for each state. The data 
indicate that in only 12 of the 42 states did the predicted score vary more than 3 score points from 
the state’s actual NAEP-92 mathematics average test score. 
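
The count of states whose predicted score misses the actual score by more than 3 points can be reproduced from the Figure 5 values. The sketch below is a simple check rather than part of the original analysis; the variable names are illustrative and the score pairs are transcribed from Figure 5.

```python
# (actual, predicted) 1992 state average scores, transcribed from Figure 5.
scores = {
    "Alabama": (251, 258), "Arizona": (265, 262), "Arkansas": (255, 258),
    "California": (260, 260), "Colorado": (272, 272), "Connecticut": (273, 273),
    "Delaware": (262, 264), "Dist. of Columbia": (234, 230), "Florida": (259, 261),
    "Georgia": (259, 256), "Hawaii": (257, 265), "Idaho": (274, 273),
    "Indiana": (269, 270), "Iowa": (283, 276), "Kentucky": (261, 260),
    "Louisiana": (249, 254), "Maine": (278, 272), "Maryland": (264, 266),
    "Massachusetts": (272, 271), "Michigan": (267, 266), "Minnesota": (282, 280),
    "Mississippi": (246, 249), "Missouri": (270, 265), "Nebraska": (277, 275),
    "New Hampshire": (278, 274), "New Jersey": (271, 271), "New Mexico": (259, 259),
    "New York": (266, 267), "North Carolina": (258, 258), "North Dakota": (283, 282),
    "Ohio": (267, 268), "Oklahoma": (267, 265), "Pennsylvania": (271, 272),
    "Rhode Island": (265, 269), "South Carolina": (260, 256), "Tennessee": (258, 259),
    "Texas": (264, 258), "Utah": (274, 283), "Virginia": (267, 266),
    "West Virginia": (258, 259), "Wisconsin": (277, 275), "Wyoming": (274, 276),
}

# States where the prediction misses the actual score by more than 3 points.
large_gaps = sorted(
    state
    for state, (actual, predicted) in scores.items()
    if abs(actual - predicted) > 3
)

print(f"{len(large_gaps)} of {len(scores)} states differ by more than 3 points:")
print(", ".join(large_gaps))
```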

Figure 6 on page 9 shows the same data contained in Figure 5, but with the states listed according 
to the rank order of their average scores in mathematics on the NAEP-92 Trial State Assessment. 
Figure 3. State Average Student Scores on 1992 NAEP 8th Grade Mathematics 
Assessment, Percent of Students in Demographic Categories, and Correlation with State 
Average Scores for Categories of Demographic Variables 

Variable                                 Student Average   Percent of   Correlation with 
  Variable Category or Level             NAEP Scores       Students     State Averages 

1. Number of Parents Living at Home 
  Both parents                                274             76%           .86¹ 
  One parent*                                 260             21%          -.84 
  Neither parent*                             246              3%          -.86 

2. Parent(s)' Educational Background 
  Don’t know²                                 251              9%          -.40 
  Did not finish high school*                 248              8%          -.68 
  High school graduate only*                  257             24%          -.46 
  Some education after high school            270             18%           .38 
  College graduate                            280             42%           .71 

3. Community Type 
  Advantaged urban                            288             10%           .14 
  Disadvantaged urban*                        238              9%          -.55 
  Extreme rural                               267              9%           .53 
  Other                                       268             72%           .02 

4. State Poverty Rate (Ages 5-17)*             --             20%          -.75 

National 42-State Average in 1992             266              --            -- 

¹ Bold = Statistically significant correlation at the .01 level of significance. 

² Dropped in this analysis as a non-responsive variable. 

* Indicates category or level used in this analysis. 

SOURCE: National Center for Education Statistics 1993, pp. 37, 71, 81, 700. 
Figure 5. Alphabetical Listing of Actual 1992 NAEP 8th Grade State Average Mathematics Scores and 
State Rankings, Plus State Predicted Scores Based on Combined Effects of Four Demographic Variables 

                        Actual 1992          Predicted 1992 
State                   Score   Ranking      Score   Ranking 

Alabama                  251      39          258      34 
Arizona                  265      23          262      27 
Arkansas                 255      38          258      34 
California               260      29          260      29 
Colorado                 272      12          272      11 
Connecticut              273      11          273       9 
Delaware                 262      27          264      26 
Dist. of Columbia        234      42          230      42 
Florida                  259      31          261      28 
Georgia                  259      31          256      38 
Hawaii                   257      37          265      23 
Idaho                    274       8          273       9 
Indiana                  269      17          270      16 
Iowa                     283       1          276       4 
Kentucky                 261      28          260      29 
Louisiana                249      40          254      40 
Maine                    278       4          272      11 
Maryland                 264      25          266      20 
Mass.                    272      12          271      14 
Michigan                 267      18          266      20 
Minnesota                282       3          280       3 
Mississippi              246      41          249      41 
Missouri                 270      16          265      23 
Nebraska                 277       6          275       6 
New Hampshire            278       4          274       8 
New Jersey               271      14          271      14 
New Mexico               259      31          259      31 
New York                 266      22          267      19 
North Carolina           258      34          258      34 
North Dakota             283       1          282       2 
Ohio                     267      18          268      18 
Oklahoma                 267      18          265      23 
Pennsylvania             271      14          272      11 
Rhode Island             265      23          269      17 
South Carolina           260      29          256      38 
Tennessee                258      34          259      31 
Texas                    264      25          258      34 
Utah                     274       8          283       1 
Virginia                 267      18          266      20 
West Virginia            258      34          259      31 
Wisconsin                277       6          275       6 
Wyoming                  274       8          276       4 

SOURCE: National Center for Education Statistics 1993, p. 37. U.S. Bureau of the Census 1993. Calculations by ERS. 
Figure 6. Rank Order Listing of Actual 1992 NAEP 8th Grade State Average Mathematics Scores 
and Predicted State Average Scores Based on Combined Effects of Four Demographic Variables 

          Actual 1992                              Predicted 1992 
State               Score   Ranking      State               Score   Ranking 

Iowa                 283       1         Utah                 283       1 
North Dakota         283       1         North Dakota         282       2 
Minnesota            282       3         Minnesota            280       3 
New Hampshire        278       4         Wyoming              276       4 
Maine                278       4         Iowa                 276       4 
Wisconsin            277       6         Wisconsin            275       6 
Nebraska             277       6         Nebraska             275       6 
Utah                 274       8         New Hampshire        274       8 
Wyoming              274       8         Idaho                273       9 
Idaho                274       8         Connecticut          273       9 
Connecticut          273      11         Maine                272      11 
Massachusetts        272      12         Pennsylvania         272      11 
Colorado             272      12         Colorado             272      11 
New Jersey           271      14         New Jersey           271      14 
Pennsylvania         271      14         Massachusetts        271      14 
Missouri             270      16         Indiana              270      16 
Indiana              269      17         Rhode Island         269      17 
Ohio                 267      18         Ohio                 268      18 
Michigan             267      18         New York             267      19 
Oklahoma             267      18         Michigan             266      20 
Virginia             267      18         Virginia             266      20 
New York             266      22         Maryland             266      20 
Arizona              265      23         Missouri             265      23 
Rhode Island         265      23         Hawaii               265      23 
Maryland             264      25         Oklahoma             265      23 
Texas                264      25         Delaware             264      26 
Delaware             262      27         Arizona              262      27 
Kentucky             261      28         Florida              261      28 
California           260      29         California           260      29 
South Carolina       260      29         Kentucky             260      29 
Georgia              259      31         West Virginia        259      31 
New Mexico           259      31         Tennessee            259      31 
Florida              259      31         New Mexico           259      31 
Tennessee            258      34         North Carolina       258      34 
West Virginia        258      34         Arkansas             258      34 
North Carolina       258      34         Texas                258      34 
Hawaii               257      37         Alabama              258      34 
Arkansas             255      38         Georgia              256      38 
Alabama              251      39         South Carolina       256      38 
Louisiana            249      40         Louisiana            254      40 
Mississippi          246      41         Mississippi          249      41 
Dist. of Columbia    234      42         Dist. of Columbia    230      42 

SOURCE: National Center for Education Statistics 1993. U.S. Bureau of the Census 1993. Calculations by ERS. 
Observations About Rank Order Data 



There are a number of interesting observations about the rank order data contained in Figure 6. 
The actual NAEP-92 state average scores ranged from a low of 234 to a high of 283, a mere 
49-point spread (out of a test total of 500 points) among the 42 states (includes D.C.) that 
participated in the 1992 NAEP state trial tests. Twenty-nine of the 42 states had NAEP average 
scores that were the same as one or more other states, and as many as 4 states had identical 
scores. Moreover, 20 states had only a 10-point spread from 257 to 267 out of a maximum of 500 
points. Such compact clustering of test scores makes state rankings virtually meaningless. 

To illustrate the potential harm that could come from states being ranked on such compact data, 
especially over time, notice in Figure 6 that Ohio, Michigan, Oklahoma, and Virginia all have an 
average math score of 267 and therefore are tied for the rank of 18 among the 42 states. A shift of 
a single point would have dropped any one of the four states to rank 21; this shift, or even greater 
shifts, could have easily happened merely by chance or by non-school-related causes. For 
example, in Ohio, the standard error of the 267 mean (which is 1.5 score points) indicates that 
there is about a 50/50 chance that Ohio could have ranked 21 rather than 18, or about the chance 
of a toss of a coin. 
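
The tie counts, the clustering, and the one-point rank shift described above can all be recomputed from the actual scores in Figure 6. The sketch below is a check under one stated assumption: ranks are assigned as one plus the number of states with a strictly higher score, which matches the tied ranks shown in Figure 6. The function and variable names are illustrative.

```python
from collections import Counter

# Actual 1992 state average scores, transcribed from Figure 6.
actual = {
    "Iowa": 283, "North Dakota": 283, "Minnesota": 282, "New Hampshire": 278,
    "Maine": 278, "Wisconsin": 277, "Nebraska": 277, "Utah": 274,
    "Wyoming": 274, "Idaho": 274, "Connecticut": 273, "Massachusetts": 272,
    "Colorado": 272, "New Jersey": 271, "Pennsylvania": 271, "Missouri": 270,
    "Indiana": 269, "Ohio": 267, "Michigan": 267, "Oklahoma": 267,
    "Virginia": 267, "New York": 266, "Arizona": 265, "Rhode Island": 265,
    "Maryland": 264, "Texas": 264, "Delaware": 262, "Kentucky": 261,
    "California": 260, "South Carolina": 260, "Georgia": 259, "New Mexico": 259,
    "Florida": 259, "Tennessee": 258, "West Virginia": 258, "North Carolina": 258,
    "Hawaii": 257, "Arkansas": 255, "Alabama": 251, "Louisiana": 249,
    "Mississippi": 246, "Dist. of Columbia": 234,
}


def rank(scores, state):
    """Rank = 1 + number of states with a strictly higher score (ties share a rank)."""
    return 1 + sum(1 for s in scores.values() if s > scores[state])


counts = Counter(actual.values())
tied_states = sum(n for n in counts.values() if n > 1)   # states sharing a score
largest_tie = max(counts.values())                       # most states on one score
clustered = sum(1 for s in actual.values() if 257 <= s <= 267)
spread = max(actual.values()) - min(actual.values())

# Lower Ohio's average by a single point and recompute its rank.
shifted = dict(actual, Ohio=actual["Ohio"] - 1)

print(f"Spread: {spread} points; tied states: {tied_states}; largest tie: {largest_tie}")
print(f"States scoring 257 to 267: {clustered}")
print(f"Ohio rank: {rank(actual, 'Ohio')} -> {rank(shifted, 'Ohio')} after a 1-point drop")
```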

Variations in State Participation 

Because participation in the NAEP state assessments is voluntary on the part of individual states, 
the number of states participating in each assessment could change from one assessment to the 
next. This in-and-out possibility could cause substantial fluctuations in a state’s rankings that were 
unrelated to any changes in the state’s educational programs or practices. The NAEP-96 data on 
state participation, shown in rank order in Figure 7 on page 11, indicate that a tremendous 
amount of state variation in participation and reporting actually occurred between the NAEP-92 
and NAEP-96 administrations of the 8th grade math assessments. 

Figure 8 on page 12 shows that of the 42 states (including D.C.) with average math scores 
reported for NAEP-92, six did not have state average scores reported for NAEP-96. Moreover, 
of the 41 states (including D.C.) with average scores reported for NAEP-96, five states did not 
have state averages reported for NAEP-92. Thus, for only 36 states of the total 50 states and 
D.C. were state average scores reported for both the 1992 and 1996 administrations of the NAEP 
mathematics assessments, an under-reporting of 29 percent (NCES 1997, p. 30). 
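
The participation arithmetic above can be laid out step by step. This is a minimal sketch using only the counts given in the text; the variable names are illustrative.

```python
total_jurisdictions = 51   # 50 states plus D.C.
reported_1992 = 42         # jurisdictions with NAEP-92 averages reported
reported_1996 = 41         # jurisdictions with NAEP-96 averages reported
only_1992 = 6              # reported in 1992 but not in 1996
only_1996 = 5              # reported in 1996 but not in 1992

# Both routes should give the same number of jurisdictions reported in both years.
both_via_1992 = reported_1992 - only_1992
both_via_1996 = reported_1996 - only_1996

# Share of all jurisdictions lacking a reported average in at least one year.
under_reporting = (total_jurisdictions - both_via_1992) / total_jurisdictions

print(f"Reported in both years: {both_via_1992} (cross-check: {both_via_1996})")
print(f"Under-reporting: {under_reporting:.0%}")
```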
Figure 7. Rank Order Listing of Actual 1996 NAEP 8th Grade State Average 
Mathematics Scores 

State               Average Score      State                   Average Score 

North Dakota             284           Virginia                     270 
Maine                    284           Maryland*                    270 
Minnesota                284           Rhode Island                 269 
Iowa*                    284           Arizona                      268 
Montana*                 283           North Carolina               268 
Wisconsin*               283           Delaware                     267 
Nebraska                 283           Kentucky                     267 
Connecticut              280           West Virginia                265 
Vermont                  279           Florida                      264 
Alaska*                  278           Tennessee                    263 
Massachusetts            278           California                   263 
Michigan*                277           Georgia                      262 
Utah                     277           Hawaii                       262 
Oregon                   276           New Mexico                   262 
Washington               276           Arkansas*                    262 
Colorado                 276           South Carolina*              261 
Indiana                  276           Alabama                      257 
Wyoming                  275           Louisiana                    252 
Missouri                 273           Mississippi                  250 
New York*                270           District of Columbia         233 
Texas                    270 

* Indicates jurisdiction did not satisfy one or more of the guidelines for school participation rates in 1996. 
SOURCE: National Center for Education Statistics 1997. 
Figure 8. Participation of States in NAEP 1992 and NAEP 1996 
Assessments of Mathematics in 8th Grade 

                                    NAEP-92                     NAEP-96 

Total states with scores reported   42 States & D.C.            41 States & D.C. 

States in 1992 but not in 1996      Six states & average 
                                    scores: 
                                    NH-278 
                                    ID-274 
                                    NJ-271 
                                    PA-271 
                                    OH-267 
                                    OK-267 

States in 1996 but not in 1992                                  Five states & average 
                                                                scores: 
                                                                MT-283 
                                                                VT-283 
                                                                AK-278 
                                                                WA-276 
                                                                OR-276 

Sources: National Center for Education Statistics 1993 and 1997. 



The possibilities of the enormous impact of variations in state participation on state comparative 
rankings are demonstrated dramatically in the case of the four states previously mentioned (Ohio, 
Michigan, Oklahoma, and Virginia), all with a score of 267 on the NAEP-92 math assessment and 
all sharing rank 18. In the NAEP-96 assessment two of the states, Ohio and Oklahoma, did not 
participate. Virginia’s NAEP-96 average score increased to 270, resulting in the rank of 20th. But 
once again, Virginia shared its score and rank with three other states, all with average scores of 
270: Maryland, New York, and Texas. 

The NAEP-96 average score reported for Michigan jumped 10 score points to 277 (a score it 
shared with Utah) to rank number 12. However, there was an important footnote at the bottom of 
the page stating that Michigan “did not satisfy one or more of the guidelines for school 
participation rates in 1996.” It should also be noted that Maryland, New York, and seven other 
states shared this same cautionary footnote (NCES 1997, p. 30). 
Figure 8 shows that there was a 16-point spread in state average NAEP test scores among the 11 
states that participated in one, but not both, of the NAEP-92 and NAEP-96 mathematics 
assessments. This 16-point spread, from 267 to 283, included 50 percent of the state average 
scores reported for NAEP-92 and 59 percent of the state average scores reported for NAEP-96. 
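
The 50 percent and 59 percent figures can be recomputed from the reported state averages. The sketch below is a check, not part of the original analysis; the 1992 list is transcribed from Figure 6, the 1996 list from Figure 7, and the function name is illustrative.

```python
# Actual state average scores in rank order (1992 from Figure 6, 1996 from Figure 7).
scores_1992 = [
    283, 283, 282, 278, 278, 277, 277, 274, 274, 274, 273, 272, 272, 271,
    271, 270, 269, 267, 267, 267, 267, 266, 265, 265, 264, 264, 262, 261,
    260, 260, 259, 259, 259, 258, 258, 258, 257, 255, 251, 249, 246, 234,
]
scores_1996 = [
    284, 284, 284, 284, 283, 283, 283, 280, 279, 278, 278, 277, 277, 276,
    276, 276, 276, 275, 273, 270, 270, 270, 270, 269, 268, 268, 267, 267,
    265, 264, 263, 263, 262, 262, 262, 262, 261, 257, 252, 250, 233,
]


def share_in_band(scores, low=267, high=283):
    """Fraction of state averages falling within the band, endpoints included."""
    return sum(1 for s in scores if low <= s <= high) / len(scores)


share_1992 = share_in_band(scores_1992)
share_1996 = share_in_band(scores_1996)

print(f"NAEP-92: {share_1992:.0%} of {len(scores_1992)} averages fall in 267-283")
print(f"NAEP-96: {share_1996:.0%} of {len(scores_1996)} averages fall in 267-283")
```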

The impact of such huge variations in participation rates both among and within states from one 
NAEP assessment to another makes trend comparisons in state rankings based on NAEP state 
average test scores not only meaningless but also misleading and potentially harmful. 

Major Effects of Non-Response Bias 

The major problems created by non-response and the ways non-response bias can greatly affect 
state average scores on the NAEP, and therefore affect state rankings, were addressed early on by 
the Panel on the Evaluation of the NAEP Trial State Assessment Project appointed by the 
National Academy of Education. In its 1993 report, the NAE Panel analyzed the problem of 
differences in the initial participation rates of different states (that is, the percentage of schools 
from the initial sample in the state that agree to participate in the NAEP Trial State Assessment). 
The NAE Panel found that: 

In 1990, there were only two states with initial school participation rates below 85 percent; 
in the 1992 TSA, one-third of the states were in this category. The lowest 1992 participation 
rates were for Maine, which recruited only 62 percent of its originally sampled schools in 
grade 8 and 58 percent in grade 4.... 

The potential seriousness of the low initial participation rates in some states is underscored 
by the Panel’s finding that higher performance on the assessment was associated with lower 
initial participation rates. This finding was replicated in both the fourth- and eighth-grade 
samples, suggesting that the result was not due to chance. Furthermore, the Panel’s analyses 
suggest that the finding was not due to one or two aberrant states. Thus, there is a concern 
that states with low initial participation rates might have inflated results on NAEP, and the 
Panel finds some of the states’ initial participation rates to be too low for accurate reporting 
of their 1992 TSA results [emphasis in original] (National Academy of Education 1993, 
100). 

Such major non-response problems place in doubt the accuracy of current rankings and 
comparisons of the relative quality or proficiency of state educational programs based on 
NAEP-92 Trial State Assessment math score averages. In addition, the possibility of major fluctuations in 
initial school participation from one NAEP assessment to the next raises important concerns about 
the reliability of NAEP state test score averages as accurate and meaningful measures of changes 
in state rankings and comparisons over time. 

Variations in Average Scores for Student Subgroups 

Another way of viewing the impact of demographic variation on NAEP state test score averages 
is to examine the national average scores for subgroups of students reported for NAEP-96, 
shown in Figure 9 below. For example, note the 28-point difference (252 vs. 280) in the average 
scores for students eligible for free or reduced-price school lunch (used as an indicator of 
poverty) and students not eligible. Again, note the 28-point difference in the national average of 
scores for students whose parents did not finish high school (254 points) and the national average 
for students whose parents graduated from college (282), another factor related to income level. 



Figure 9. Variations in NAEP-96 Nationwide Average Mathematics 
Scores Reported for 8th Grade Student Subgroups 

Student Subgroup                      National Average    Standard Error 

All Students                               272                 1.1 

Free/Reduced-Price Lunch 
  Eligible                                 252                 1.5 
  Not Eligible                             280                 1.4 

Parents’ Highest Education Level 
  Did Not Finish High School               254                 1.8 
  Graduated From High School               261                 1.2 
  Some Educ. After High Sch.               279                 1.4 
  Graduated From College                   282                 1.5 
  I Don’t Know                             254                 1.6 

Sources: National Center for Education Statistics 1997. 



Using the NAEP-96 data in Figure 9, consider two hypothetical states, A and B, both with equal 
quality and proficiency of their educational programs and both with students eligible for free or 
reduced-price school lunch scoring at the national average of 252 points and both with students 
not eligible scoring at the national average of 280 points. State A, however, has 50 percent of its 
students eligible for free and reduced-price lunch and 50 percent not eligible. State B has only 20 
percent of its students eligible and 80 percent not eligible. These percentages of eligibility are 
within the range reported for states in the NAEP-96 assessment. 

Given these conditions, the NAEP test score average for State A would be only 266 points while 
the test score average for State B would be 274. Obviously, the difference in the two states’ 
average NAEP test scores does not measure differences in the performance or quality of the two 
states’ educational systems. 
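
The State A and State B averages follow from a weighted average of the two subgroup scores. A minimal sketch of that calculation, using the Figure 9 subgroup averages and the hypothetical eligibility shares from the example (the function and constant names are illustrative):

```python
ELIGIBLE_AVG = 252       # national average, eligible for free/reduced-price lunch
NOT_ELIGIBLE_AVG = 280   # national average, not eligible


def state_average(share_eligible):
    """Weighted average of the two subgroup scores for a given eligible share."""
    return share_eligible * ELIGIBLE_AVG + (1 - share_eligible) * NOT_ELIGIBLE_AVG


state_a = state_average(0.50)   # 50 percent eligible
state_b = state_average(0.20)   # 20 percent eligible

print(f"State A: {state_a:.1f}  State B: {state_b:.1f}")
print(f"Gap due to composition alone: {state_b - state_a:.1f} points")
```

Both hypothetical states perform identically within each subgroup, so the gap of roughly 8 points comes entirely from the difference in population mix.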

Other Supporting Analyses 

The major findings of this analysis are supported by similar findings of other analysts. William W. 
Cooley, director of Pennsylvania Educational Policy Studies at the University of Pittsburgh, in an 
analysis of the effects of only three demographic variables (percent of residents not high school 
graduates, percent of students in poverty, and percent of single-parent homes) on NAEP-92 Trial 
State math scores, found that: 

Over 75 percent of the state variations in the math means can be explained by the nature 
of the populations being served by the schools in those states. Therefore, one clearly 
cannot use NAEP math score comparisons to make accurate inferences about the relative 
quality of the math programs in these 42 states (Cooley 1993, 29). 

Richard M. Wolf, of the Department of Measurement, Evaluation, and Statistics, Teachers 
College, Columbia University, in analyzing the predictive effects of economic and demographic 
variables on the NAEP-90 Trial State math test scores, concluded: 



The evidence presented here clearly indicates that such differences cannot be rejected. If 
three readily available indicators, two that reflect general characteristics of a state’s 
population and one that reflects a measure of a state’s wealth, can predict average state 
NAEP performance so well, then what policy relevance can be obtained from state-by- 
state comparisons on the NAEP tests? (Wolf 1992, 12). 

Relationship of Demographic Variables to District-by-District Comparisons 

William W. Cooley has also examined the relationship of demographic variables to differences in 
test score averages among school districts. Cooley analyzed the average scores of the 500 
Pennsylvania school districts on the state’s mandated Test of Essential Learning and Literacy 
Skills (TELLS). He found that three demographic variables (percent of residents not high 
school graduates, percent of students in poverty, and percent of single-parent homes) yielded a 
negative multiple correlation of .78 with the school districts’ TELLS scores. He concluded: 

This means that over 60 percent of the variation in the average student performance 
among these school districts can be explained by those three simple census factors, 
leaving only about 40 percent to be explained by all other possible factors, including other 
demographic variables besides these three. 

In other words, comparing districts on such a statewide test reveals more about the 
difficulty of their educational task than about the quality of their educational program 
[emphasis added] (Cooley 1993, 28). 
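
The step from a multiple correlation of .78 to "over 60 percent of the variation" is the usual 
squaring of the correlation coefficient. A minimal sketch of that arithmetic follows; the variable 
names are illustrative only and do not come from Cooley's analysis. 

```python
# Multiple correlation between the three census variables and district TELLS scores,
# as reported in the text above (its magnitude is what matters here).
multiple_r = 0.78

# Squaring the correlation gives the share of variance explained.
share_explained = multiple_r ** 2
share_unexplained = 1 - share_explained

print(f"explained: {share_explained:.1%}, unexplained: {share_unexplained:.1%}")
```

The result is consistent with the quoted "over 60 percent" explained and "about 40 percent" left 
to all other factors. 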

Such findings regarding the strong effects of demographic variables on the variation in student 
achievement test score averages among school districts within a state are important because some 
policy makers are urging the expanded use of NAEP data to compare and rank local school 
districts on their average NAEP test scores. 



State NAEP Score Adjustment Controversy 

To provide “fairness” in state NAEP rankings, it has been proposed that the state NAEP test 
scores be adjusted statistically “to reflect differences among the states in school resources and in 
the ethnic, economic, and other characteristics of their student populations” (Viadero 1994, 1). 

One method of adjustment would call for analyzing how a state might fare on the assessment if its 
population mirrored that of the nation as a whole. Another method would be to look at a state’s 
scores as if the nation’s population had the same demographic characteristics as the state. Other 
factors that might be used to adjust state NAEP scores include those reflecting student 
“opportunity to learn” factors, such as state differences in per pupil expenditures and other 
measures of school resources. 
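
As a rough illustration of the two adjustment methods described above, the sketch below reweights 
subgroup averages by population shares. Every group name, score, and share in it is a hypothetical 
value chosen only for illustration. None of it is NAEP data, and it is not the procedure of any 
specific adjustment proposal. 

```python
def weighted_mean(group_means, group_shares):
    """Average of subgroup means, weighted by each subgroup's population share."""
    total_share = sum(group_shares.values())
    return sum(group_means[g] * group_shares[g] for g in group_means) / total_share

# Hypothetical subgroup averages and population shares (not NAEP data).
state_means = {"eligible": 255.0, "not_eligible": 280.0}
state_shares = {"eligible": 0.60, "not_eligible": 0.40}
national_means = {"eligible": 258.0, "not_eligible": 278.0}
national_shares = {"eligible": 0.40, "not_eligible": 0.60}

# The state's ordinary, unadjusted average.
unadjusted = weighted_mean(state_means, state_shares)

# Method 1: the state's subgroup averages applied to the national population mix.
state_with_national_mix = weighted_mean(state_means, national_shares)

# Method 2: the national subgroup averages applied to the state's population mix.
nation_with_state_mix = weighted_mean(national_means, state_shares)

print(unadjusted, state_with_national_mix, nation_with_state_mix)
```

Under these made-up numbers the two methods give different adjusted figures for the same state, 
which hints at why the choice of adjustment method is itself open to dispute. 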

Emerson J. Elliott, former Commissioner of Education Statistics, U.S. Department of Education, 
stated: 

This is an important issue, and it can’t be washed away by saying the only thing the 
statistical agency should do is report results for the overall population (Viadero 
1994, 18). 

However, critics of the idea of statistical adjustments to NAEP state scores charge that such 
changes to state scores “would implicitly concede that poor children cannot be expected to do as 
well in school as their more affluent peers” (Viadero 1994, 1). 

Edward Roeber, director of State Collaborative Programs on Assessment for the Council of Chief 
State School Officers, makes this observation about the idea of adjusting state NAEP test scores: 

Rather than saying we have standards we want all students to meet, these kinds of 
efforts literally have given us the idea that poor children cannot learn. The implied 
message is, “you have lots of poor children, so your scores should be lower” (Viadero 
1994, 18). 

Chester E. Finn, Jr., a former member of the National Assessment Advisory Board, declared that 
proposals to statistically adjust state NAEP test scores are: 

... probably the worst idea I’ve encountered in 10 years of closely watching NAEP. 

Once you start fiddling with the numbers you can ... show anything you like, and then 
you begin to lose public confidence (Viadero 1994, 18). 

However, Grissmer, Kirby, Berends, and Williams concluded in a RAND study of Student 
Achievement and the Changing American Family: 
Comparisons of simple, unadjusted test scores from one year to the next or across 
different schools or districts do not provide a valid indicator of the performance of the 
teachers, school, or school districts unless the differences in scores are very large 
compared to what might be accounted for by changing demographic or family 
characteristics. This is rarely the case; so, any use of unadjusted test scores to judge or 
reward teachers or schools will inevitably misjudge which teachers and schools are 
performing better (Grissmer et al. 1994). 

An example of the complicated and controversial problems that can occur when NAEP Trial State 
Assessment data are statistically adjusted to account for state demographic variables occurred in 
New Jersey. When the state teachers union, the New Jersey Education Association (NJEA), 
learned that the state ranked 14th in mathematics test scores among the 42 states that participated 
in the NAEP-92 math assessment (see Figure 6, page 9), NJEA officials commissioned a study in 
hopes that “the results would show that the state’s public school teachers are doing a good job” 
(Education Week 1994, 4). 

NJEA commissioned Howard Wainer, a researcher for the Educational Testing Service, to do an 
analysis independent of the ETS federal NAEP assessment program. The analysis found that when 
factors such as race and the number of limited-English-proficient students were taken into account, 
New Jersey moved from rank 14 to rank 4 among 42 participating states (Education Week 1994, 4). 

It is technically feasible to make mathematical adjustments in the NAEP Trial State Assessment 
scores that statistically take into account the effects of various factors related to the states’ 
NAEP score rankings. There is much evidence, however, that such adjustments to state NAEP 
test scores would be highly controversial and would probably create even more problems than the 
adjustments would solve. 

Measuring Difficulty of the Educational Challenges 

The fact that 89 percent of the state differences in NAEP-92 Trial State Assessment mathematics 
test score averages can be explained by variations in four demographic variables over which 
schools have no control raises a serious question: What are the differences in the NAEP-92 Trial 
State math scores actually measuring? Rather than measuring differences in the quality or 
proficiency of the states’ educational programs, the NAEP-92 Trial State average math scores 
appear to more accurately reflect differences in what, as previously cited, William Cooley has 
termed the difficulty of the educational tasks confronting the various states (Cooley 1993, 28). 

Since the word tasks may imply unwanted burdens, the term difficulty of the educational 
challenges is perhaps more descriptive of what the differences in NAEP state test score averages 
appear to reflect. But whatever the term, it is abundantly clear that modest differences in NAEP 
state mathematics test score averages do not reflect real or meaningful comparative differences in 
the relative educational quality, or proficiency, of specific states. Moreover, the indications are 
that the gross comparative ranking of the states on the basis of NAEP state average test scores 
can lead to misconceptions about the relative educational quality of specific states. But it is 
equally clear that statistical adjustments of NAEP test scores can lead to even further 
misconceptions. 

Scores Not Reliable Measures of Educational Effectiveness 

There is evidence that these same findings regarding NAEP state test score comparative rankings 
also apply to comparisons of average test scores among districts within a state, among schools 
within a district, and even among teachers within a school. There is much research that shows 
some schools and teachers who are faced with very difficult educational challenges and with low 
but rising student test scores are working to overcome enormous educational disadvantages and 
huge social neglect. These schools and teachers are producing quantities and qualities of student 
learning not currently recognized or appreciated by popular perspectives of test score 
comparisons. Some researchers have found that such effective schools and teachers are actually 
more educationally productive than are other schools and teachers where student test scores are 
high but where school and teacher efforts are augmented by strong family advantages and 
nurturing communities, and consequently, where the educational challenges are much less 
demanding. 

Evidence of such effects was found by David Grissmer et al. in the previously cited RAND 
study of Student Achievement and the Changing American Family: 

Indeed, the evidence provided here hints that a stronger case could be made that teachers 
and schools with large numbers of minority students may have been responsible for the 
most significant gains in test scores over the last 20 years, while family effects-not 
schools-may have been responsible for gains in nonminority scores. . .this evidence 
illustrates the possibility of dramatic changes in perspective that more detailed analyses 
can provide (Grissmer 1994, 19-20). 

Change in Perspective 

The statistical evidence is unmistakable; NAEP state test score gross averages are not measures of 
the relative differences in the proficiency of the states’ educational systems. The compelling 
evidence is that certain demographic and economic factors are much more accurate in predicting 
differences in NAEP state test score averages than are all other factors combined, including any 
possible differences in the educational quality or proficiencies of the various states. 

This unequivocal evidence calls for a dramatic change in the current perspective about the 
meaning and use of NAEP state test score averages. The popular concept that NAEP gross test 
score averages are reliable measures of comparative educational performance and proficiency, 
in which citizens and agencies can find cause for either pride or concern about the relative quality 
of their state’s educational system, is unwarranted and without support. Rather than being viewed as 
meaningful measures for educational accountability, NAEP gross state test score averages should 
be viewed as only rough reverse indicators of the relative quantities and nature of resources 
needed for the varied populations of students in the separate states to achieve equal levels of 
learning. 

The evidence calls for perceiving NAEP state test score averages as merely rough reflections of 
the differences in the difficulty of the educational challenges confronting the states. But the 
question immediately arises: Does this concept mean that students in adverse environments should 
be expected to meet lower academic standards than expected for other students? Absolutely not! 

The concept of the difficulty of the educational challenge merely recognizes the fact that students 
in adverse learning environments require more resources, more time, more proficient pedagogical 
skills, and more adept instructional approaches in order to overcome their disadvantages. Clearly, 
considerably more resources and teaching efforts are required if students in disadvantaged 
environments are to be provided an equal opportunity to learn and achieve at the same levels as 
advantaged students. 

It is unfair and even unethical to expect many students who, as research clearly shows, start 
with major family, community, economic, and other educational disadvantages to achieve at the 
same rates and levels as advantaged students without providing the disadvantaged students with 
sufficient and effective additional learning resources and instructional support. This fundamental 
principle applies throughout education. It most certainly underlies inferences relating to 
comparisons and rankings based on NAEP test score averages for states or school districts. 



State Data Used in This Analysis 



The state-by-state NAEP-92 data for each of the demographic variables used in calculating the 
predicted state scores and state rankings used in this analysis are shown in Figure 10 on page 22. 



Summary and Conclusions 

This analysis has addressed the important question of whether NAEP state test score averages 
should be used for the purpose of comparing and ranking the quality of mathematics instruction 
among the various states. The analysis has also addressed the question: What do NAEP state test 
score comparisons really mean? In addressing these questions, the paper draws from data 
collected in both the NAEP 1996 and the NAEP 1992 State Trial Assessments in Mathematics for 
addressed in the analysis. 



To accept a specific factor as a valid measure of the relative quality or proficiency of mathematics 
instruction among the states, one must be able to reject with reasonable confidence other possible 
factors influencing average test score differences, such as economic or demographic variables. 

This analysis found that 89 percent of the variation in state average test scores on the NAEP-92 
Trial State Assessment in mathematics can be explained by the combined effects of four 
demographic variables — number of parents living at home, parent(s)’ education, community 
type, and state poverty rates. This leaves only 11 percent of the differences among the state test 
score averages to be explained by all other variables including differences in the educational 
quality or proficiency of the various states. 

Since test score averages among the states are so strongly affected by four demographic factors 
over which schools have no control, NAEP-92 state test score averages in 8th grade mathematics 
are shown not to be valid measures to use for the purpose of comparing and ranking states 
according to the relative quality or proficiency of the states’ educational programs. Neither are 
similar test score averages valid measures to use for comparing and ranking school districts 
within a state according to the relative quality or proficiency of the districts’ educational 
programs. 

In view of the finding that 89 percent of the state test score differences can be explained by four 
specific demographic variables, it is important to ask: What are the differences in the NAEP-92 
Trial State math scores actually measuring? Rather than measuring differences in the quality or 
proficiency of the states’ educational programs, the NAEP state average scores were found to 
more accurately reflect differences in the difficulty of the educational challenges confronting the 
various states. 

While the differences in NAEP state test score averages are found to correlate highly with certain 
student demographic variables, such correlations should not be used as an excuse to expect less 
learning from children in adverse circumstances. Rather, these findings should be viewed as rough 
indicators of the need for appropriate resources and instructional support to help diverse 
combinations of student populations in the different states achieve equally high educational 
standards and learning levels. Analysts examining other assessment data for local school districts 
within states have also found a strong relationship between demographic variables and the 
districts’ averages of student scores on state tests. 

This analysis found that, in addition to the demographic influences on NAEP state assessment 
score averages, certain non-demographic factors, such as non-response bias (including major 
variations in state participation from one NAEP assessment to the next and major variations in 
initial school participation rates within states), may substantially influence NAEP state test score 
averages and thus materially affect state rankings and state comparisons. The possible strong 
influence of such non-demographic variables adds an important reason for not considering or 
using NAEP state test score averages as indicators of the relative quality or proficiency of state 
educational programs. 

Proposals have been made to adjust state NAEP scores statistically to reflect more fairly the 
effects of state variations in factors such as school resources, ethnic groups, and specific 
demographic characteristics of the student population. However, debate on this idea indicates 
clearly that such statistical adjustments would be complicated and highly controversial and would 
create more problems than they would solve. 

The findings of this analysis regarding the major impact of demographic factors on NAEP state 
assessment score averages indicate the need for a new perception of NAEP test scores that will 
focus attention, resources, and efforts toward addressing the difficulty of the educational 
challenges confronting states, districts, and schools rather than using NAEP assessment data to 
inappropriately and unfairly compare and rank states on presumed differences in the quality of 
their educational programs. 



* * * 
Figure 10. State Demographic Category/Level Data Used in Analysis of 1992 NAEP State 8th Grade 
Mathematics Assessment Participants 

                              Parent(s)' Education                Parent(s) Living at Home            Community         Poverty
State                         Not High School   High School       One Parent        Neither Parent    Student Living    State Poverty
                              Graduate          Graduate Only     Living at Home    Living at Home    in Disadvantaged  Rate, 1992
                                                                                                      Urban Community   (Ages 5-17)

Average for 42 States and DC  8%                25%               21%               3%                9%                19.6%
Alabama                       13%               29%               24%               3%                16%               23.5%
Arizona                       10%               21%               22%               3%                14%               21.9%
Arkansas                      11%               31%               21%               4%                5%                20.8%
California                    10%               17%               22%               4%                19%               22.1%
Colorado                      6%                21%               21%               2%                10%               10.9%
Connecticut                   6%                22%               19%               2%                17%               16.5%
Delaware                      6%                30%               24%               3%                0%                11.2%
District of Columbia          9%                29%               47%               8%                67%               31.8%
Florida                       8%                24%               25%               3%                17%               21.6%
Georgia                       11%               30%               25%               3%                10%               27.8%
Hawaii                        6%                25%               21%               4%                16%               15.9%
Idaho                         7%                19%               15%               2%                5%                20.2%
Indiana                       8%                32%               20%               2%                11%               13.2%
Iowa                          4%                25%               16%               2%                3%                14.7%
Kentucky                      15%               32%               20%               3%                12%               23.1%
Louisiana                     10%               30%               25%               4%                19%               32.4%
Maine                         6%                26%               17%               2%                2%                16.5%
Maryland                      6%                25%               23%               3%                13%               16.0%
Massachusetts                 7%                21%               21%               2%                23%               18.2%
Michigan                      6%                26%               23%               3%                19%               17.9%
Minnesota                     3%                22%               14%               1%                0%                17.0%
Mississippi                   13%               29%               27%               4%                6%                30.6%
Missouri                      8%                29%               21%               3%                12%               18.6%
Nebraska                      4%                24%               17%               2%                6%                14.2%
New Hampshire                 6%                24%               17%               2%                0%                9.3%
New Jersey                    7%                23%               19%               3%                24%               13.0%
New Mexico                    11%               26%               22%               3%                6%                27.7%
New York                      6%                23%               23%               2%                16%               23.4%
North Carolina                10%               27%               24%               3%                5%                23.7%
North Dakota                  3%                19%               13%               1%                0%                12.9%
Ohio                          7%                32%               23%               2%                17%               18.3%
Oklahoma                      8%                26%               20%               3%                5%                19.4%
Pennsylvania                  7%                30%               19%               2%                15%               14.6%
Rhode Island                  8%                22%               20%               2%                12%               19.6%
South Carolina                9%                31%               23%               4%                6%                27.7%
Tennessee                     12%               29%               24%               3%                7%                17.1%
Texas                         16%               21%               22%               3%                18%               23.2%
Utah                          3%                15%               14%               1%                5%                11.2%
Virginia                      9%                24%               21%               3%                13%               14.3%
West Virginia                 13%               33%               19%               3%                10%               31.7%
Wisconsin                     5%                28%               19%               1%                5%                13.8%
Wyoming                       5%                23%               17%               2%                10%               12.3%

SOURCE: National Center for Education Statistics 1993. U.S. Bureau of the Census 1993. Calculations by ERS. 
Technical Note 



Data on the percentage of students in each state falling within four categories — living 
in a disadvantaged urban community, the state poverty rate for children ages 5-17, 
parent(s) who did not graduate from high school or did not continue their education 
beyond high school, and only one parent or neither parent at home — were placed into 
a multiple regression equation with 1992 state NAEP mathematics scores as the 
dependent “y” variable to be predicted. 

The resulting equation was used to generate the predicted scores based on the state 
demographic variables. The equation generated is: 

y = 303.223 + (-.758 x Parents Not High School Graduates) 
    + (-.01 x Parents High School Graduates Only) 
    + (-.928 x One Parent at Home) 
    + (-2.926 x Neither Parent at Home) 
    + (.152 x Disadvantaged Urban Community) 
    + (-.28 x State Poverty Rate, Ages 5-17). 
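
The equation above can be applied directly to a row of Figure 10. The sketch below is one way to do 
that, assuming the percentages are entered as whole-number values (8 for 8%). The dictionary keys 
and the function name are hypothetical labels introduced here, and because the printed coefficients 
are rounded, results may differ slightly from the predicted scores reported elsewhere in the paper. 

```python
# Intercept and coefficients as printed in the regression equation above.
INTERCEPT = 303.223
COEFFICIENTS = {
    "not_hs_graduate": -0.758,         # Parents Not High School Graduates
    "hs_graduate_only": -0.01,         # Parents High School Graduates Only
    "one_parent_at_home": -0.928,      # One Parent at Home
    "neither_parent_at_home": -2.926,  # Neither Parent at Home
    "disadvantaged_urban": 0.152,      # Disadvantaged Urban Community
    "state_poverty_rate": -0.28,       # State Poverty Rate, Ages 5-17
}

def predicted_score(demographics):
    """Predicted state average from percentages given on a 0-100 scale."""
    return INTERCEPT + sum(
        coefficient * demographics[name] for name, coefficient in COEFFICIENTS.items()
    )

# Values copied from the Figure 10 row "Average for 42 States and DC".
average_row = {
    "not_hs_graduate": 8,
    "hs_graduate_only": 25,
    "one_parent_at_home": 21,
    "neither_parent_at_home": 3,
    "disadvantaged_urban": 9,
    "state_poverty_rate": 19.6,
}

# Values copied from the Figure 10 row for Iowa.
iowa_row = {
    "not_hs_graduate": 4,
    "hs_graduate_only": 25,
    "one_parent_at_home": 16,
    "neither_parent_at_home": 2,
    "disadvantaged_urban": 3,
    "state_poverty_rate": 14.7,
}

print(round(predicted_score(average_row), 1))
print(round(predicted_score(iowa_row), 1))
```

Sorting the states by the value this function returns would give a predicted ranking that can be set 
beside the ranking based on actual scores. 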